February 16, 2021

Gene expression

I begin the analysis by identifying the differentially expressed genes.

To do so, I compare the means of the control and disease groups, starting with a simple two-sample t-test. For each gene I also check the equality of variances before comparing the group means.
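A minimal sketch of the per-gene statistics behind this step. The slides do not say which variance test is used, so the variance ratio (F statistic) here is an assumption, as are the function names. Turning `t` and `df` into a p-value needs the CDF of the t distribution from a statistics library, which is left out.

```python
from math import sqrt
from statistics import mean, variance


def variance_ratio(control, disease):
    """F statistic for the variance check (larger variance in the numerator)."""
    v1, v2 = variance(control), variance(disease)
    return max(v1, v2) / min(v1, v2)


def t_statistic(control, disease, equal_var):
    """Return (t, degrees of freedom) for a two-sample t-test on one gene."""
    n1, n2 = len(control), len(disease)
    m1, m2 = mean(control), mean(disease)
    v1, v2 = variance(control), variance(disease)  # sample variances (n - 1)
    if equal_var:
        # Student's t-test with a pooled variance estimate
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        t = (m1 - m2) / sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch's t-test with Welch-Satterthwaite degrees of freedom
        se2 = v1 / n1 + v2 / n2
        t = (m1 - m2) / sqrt(se2)
        df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df
```

The `equal_var` flag would be set from the outcome of the variance check, so each gene gets either the pooled or the Welch variant.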

P-values from t-test: before and after correction

Gene expression - corrected test scheme

The number of differentially expressed genes proved to be very high, so I check whether the normality assumption distorts the results by applying a new test scheme:

  • Check whether both groups are normally distributed.
  • If they are, perform a t-test, checking the equality of variances beforehand.
  • If they are not, perform a Mann-Whitney test.
  • Correct the obtained p-values for multiple testing with the Benjamini-Hochberg method.
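The last step of the scheme, the Benjamini-Hochberg adjustment, can be sketched as follows. This is a plain standard-library version written for illustration (the function name is hypothetical), not the code used to produce the slides.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene is then called differentially expressed when its adjusted p-value falls below the chosen false discovery rate threshold.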

P-values after distribution consideration

Distribution effect

I compare the results obtained with the distribution-aware test scheme against those obtained under the earlier assumption of normality.

Enrichment analysis

After identifying the differentially expressed genes, I proceed to enrichment analysis. I start with over-representation analysis (ORA) and then move on to functional class scoring (FCS) methods.

ORA

##                                                 Title corrected_pvals
## 100                                     RNA transport    4.259285e-08
## 133                                        Cell cycle    4.259285e-08
## 305                               MicroRNAs in cancer    1.182302e-07
## 260                                 Alzheimer disease    1.960257e-06
## 265                                     Prion disease    1.960257e-06
## 295           Human T-cell leukemia virus 1 infection    1.960257e-06
## 300                                Pathways in cancer    2.529952e-06
## 88                                 Metabolic pathways    4.283633e-06
## 263                                Huntington disease    4.283633e-06
## 266 Pathways of neurodegeneration - multiple diseases    4.546480e-06
## 115                            Fanconi anemia pathway    6.612935e-06
## 170                                    Focal adhesion    6.612935e-06
## 304                           Proteoglycans in cancer    6.612935e-06
## 262                     Amyotrophic lateral sclerosis    7.685330e-06
## 101                         mRNA surveillance pathway    2.554206e-05
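ORA p-values such as those above are commonly obtained from a one-sided hypergeometric test. The slides do not state the exact test, so the following is a sketch under that assumption, with hypothetical parameter names.

```python
from math import comb


def ora_pvalue(n_universe, n_de, set_size, overlap):
    """P(X >= overlap) when drawing n_de genes from n_universe genes,
    of which set_size belong to the gene set (hypergeometric upper tail)."""
    total = comb(n_universe, n_de)
    upper = min(set_size, n_de)
    tail = sum(
        comb(set_size, k) * comb(n_universe - set_size, n_de - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

Here `n_de` is the number of differentially expressed genes and `overlap` is how many of them fall into the pathway. The per-pathway p-values would then be corrected for multiple testing.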

CERNO

##                                       Title corrected_pvals
## 88                       Metabolic pathways    2.368986e-08
## 133                              Cell cycle    2.368986e-08
## 300                      Pathways in cancer    2.368986e-08
## 295 Human T-cell leukemia virus 1 infection    4.941884e-08
## 170                          Focal adhesion    1.231707e-07
## 305                     MicroRNAs in cancer    5.131701e-07
## 159      Vascular smooth muscle contraction    5.800451e-07
## 116                  MAPK signaling pathway    2.139096e-06
## 119                  Rap1 signaling pathway    2.139096e-06
## 125             Chemokine signaling pathway    4.895844e-06
## 304                 Proteoglycans in cancer    4.895844e-06
## 148              PI3K-Akt signaling pathway    5.527751e-06
## 144                             Endocytosis    5.931361e-06
## 177     Complement and coagulation cascades    5.931361e-06
## 121              cGMP-PKG signaling pathway    7.034663e-06
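For reference, the CERNO statistic for one gene set is F = -2 Σ ln(r_i / N), where r_i are the ranks of the set's genes in the full ordered list of N genes, and F is compared against a chi-square distribution with 2k degrees of freedom for a set of k genes. A minimal sketch, assuming ranks start at 1 for the most significant gene (function names are hypothetical):

```python
from math import exp, log


def chi2_sf_even(x, df):
    """Chi-square survival function, exact for even degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def cerno_pvalue(ranks, n_genes):
    """Return (F statistic, p-value) for one gene set."""
    stat = -2.0 * sum(log(r / n_genes) for r in ranks)
    return stat, chi2_sf_even(stat, 2 * len(ranks))
```

Gene sets whose members concentrate near the top of the ranking get a large F and therefore a small p-value.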

Z-transform

##                                                 Title corrected_pvals
## 88                                 Metabolic pathways    9.001854e-28
## 300                                Pathways in cancer    1.231452e-22
## 266 Pathways of neurodegeneration - multiple diseases    9.693740e-16
## 295           Human T-cell leukemia virus 1 infection    1.234043e-15
## 148                        PI3K-Akt signaling pathway    2.047843e-14
## 116                            MAPK signaling pathway    6.369432e-14
## 133                                        Cell cycle    6.369432e-14
## 170                                    Focal adhesion    6.369432e-14
## 260                                 Alzheimer disease    2.115722e-13
## 305                               MicroRNAs in cancer    3.002434e-13
## 304                           Proteoglycans in cancer    4.702153e-13
## 265                                     Prion disease    3.590622e-12
## 263                                Huntington disease    4.907427e-12
## 119                            Rap1 signaling pathway    6.906820e-12
## 294                    Human papillomavirus infection    5.373930e-11
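The slides do not spell out the Z-transform variant. One common form is Stouffer's method applied to the gene-level p-values of a set, which is what this sketch assumes (the function name is hypothetical).

```python
from math import sqrt
from statistics import NormalDist


def z_transform_pvalue(gene_pvals):
    """Return (combined Z, one-sided p-value) for one gene set.

    Each p-value must lie strictly between 0 and 1.
    """
    norm = NormalDist()
    # By symmetry, -inv_cdf(p) equals inv_cdf(1 - p) but stays accurate for tiny p
    z_scores = [-norm.inv_cdf(p) for p in gene_pvals]
    z = sum(z_scores) / sqrt(len(z_scores))
    return z, norm.cdf(-z)
```

Because every gene in the set contributes, large pathways with many moderately small p-values can reach very small combined p-values.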

GSEA implementation

Signal to noise absolute - p-values

The MATLAB output looks clearly anomalous.

Signal to noise absolute - ES

## P-value:  4.997284e-42
## Correlation coefficient:  -0.6518573

Signal to noise - p-values

Signal to noise - ES

## P-value:  0.261582
## Correlation coefficient:  0.06141717

LFC absolute - p-values

LFC absolute - ES

## P-value:  2.1936e-26
## Correlation coefficient:  -0.5360192

LFC - p-values

LFC - ES

## P-value:  0.007659464
## Correlation coefficient:  0.1452507
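The four GSEA runs above differ only in the gene ranking metric. A minimal sketch of the metrics follows, assuming the usual definitions: signal to noise as the difference of group means divided by the sum of group standard deviations, and LFC as the log2 ratio of group means. The direction (disease relative to control) and the function names are assumptions.

```python
from math import log2
from statistics import mean, stdev


def signal_to_noise(control, disease):
    """Difference of group means scaled by the sum of standard deviations."""
    return (mean(disease) - mean(control)) / (stdev(disease) + stdev(control))


def log_fold_change(control, disease):
    """log2 ratio of group means; assumes positive, non-log expression values."""
    return log2(mean(disease) / mean(control))


def rank_genes(scores, absolute=False):
    """Order gene names by score, descending; optionally by absolute value."""
    if absolute:
        return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return sorted(scores, key=lambda g: scores[g], reverse=True)
```

With `absolute=True`, strongly down-regulated genes move to the top of the list alongside the up-regulated ones, which changes which gene sets accumulate enrichment score.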

PLAGE

##                                                    Title corrected_pvals
## 56                        Glycerophospholipid metabolism    9.275123e-53
## 171                             ECM-receptor interaction    9.405913e-53
## 238                                   Insulin resistance    1.147170e-52
## 233                            Relaxin signaling pathway    8.115912e-52
## 240 AGE-RAGE signaling pathway in diabetic complications    9.085146e-52
## 88                                    Metabolic pathways    5.763372e-51
## 148                           PI3K-Akt signaling pathway    6.818963e-51
## 27                       Arginine and proline metabolism    1.392011e-50
## 254                     Protein digestion and absorption    1.478016e-50
## 305                                  MicroRNAs in cancer    1.478016e-50
## 235  Parathyroid hormone synthesis, secretion and action    1.543308e-50
## 173                                    Adherens junction    2.323504e-50
## 273                            Vibrio cholerae infection    2.323504e-50
## 144                                          Endocytosis    3.092973e-50
## 166                             Apelin signaling pathway    3.250911e-50

log P-values correlation

Results comparison (GSEA still to be added)

##                    ORA CERNO Z_transform PLAGE
## enriched gene sets 114   200         263   338

Joint gene sets

## Number of joint enriched gene sets:  111

Combining p-values

##                                                      Title pval_combined
## hsa04022                        cGMP-PKG signaling pathway  0.000000e+00
## hsa05206                               MicroRNAs in cancer  0.000000e+00
## hsa05200                                Pathways in cancer  2.036205e-39
## hsa01100                                Metabolic pathways  6.959628e-39
## hsa04110                                        Cell cycle  3.834775e-34
## hsa04510                                    Focal adhesion  4.838081e-34
## hsa05166           Human T-cell leukemia virus 1 infection  2.440812e-32
## hsa05205                           Proteoglycans in cancer  4.863737e-32
## hsa05022 Pathways of neurodegeneration - multiple diseases  8.707401e-31
## hsa04151                        PI3K-Akt signaling pathway  8.386346e-30
## hsa04010                            MAPK signaling pathway  8.529154e-30
## hsa05010                                 Alzheimer disease  1.119963e-29
## hsa05020                                     Prion disease  3.556770e-29
## hsa04360                                     Axon guidance  6.058880e-29
## hsa04015                            Rap1 signaling pathway  6.865159e-29
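The slides do not name the combination method. A common choice is Fisher's method, which the sketch below assumes: the statistic -2 Σ ln(p_i) over k methods is compared against a chi-square distribution with 2k degrees of freedom. Function names are hypothetical.

```python
from math import exp, log


def chi2_sf_even(x, df):
    """Chi-square survival function, exact for even degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def fisher_combine(pvals):
    """Combine per-method p-values for one gene set (each must be > 0)."""
    stat = -2.0 * sum(log(p) for p in pvals)
    return chi2_sf_even(stat, 2 * len(pvals))
```

For one pathway, `pvals` would hold its p-values from ORA, CERNO, Z-transform and PLAGE. Fisher's method formally assumes independent tests, which methods run on the same data do not satisfy, so the combined values are best read as a ranking aid.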

Visualizations